stochastic gradient hamiltonian monte carlo
Bayesian Optimization with Robust Bayesian Neural Networks
Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, Frank Hutter
Bayesian optimization is a prominent method for optimizing expensive-to-evaluate black-box functions that is widely applied to tuning the hyperparameters of machine learning algorithms. Despite its successes, the prototypical Bayesian optimization approach - using Gaussian process models - does not scale well to either many hyperparameters or many function evaluations. Attacking this lack of scalability and flexibility is thus one of the key challenges of the field. We present a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible. We obtain scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness we improve via a scale adaptation.
Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo
Deep Gaussian Processes (DGPs) are hierarchical generalizations of Gaussian Processes that combine well calibrated uncertainty estimates with the high flexibility of multilayer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art inference method, Variational Inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation of the generally multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior and we apply the Stochastic Gradient Hamiltonian Monte Carlo method to generate samples. To efficiently optimize the hyperparameters, we introduce the Moving Window MCEM algorithm.
Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles
Millard, Andrew, Zhao, Zheng, Murphy, Joshua, Maskell, Simon
Sequential Monte Carlo (SMC) methods offer a principled approach to Bayesian uncertainty quantification but are traditionally limited by the need for full-batch gradient evaluations. We introduce a scalable variant by incorporating Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) proposals into SMC, enabling efficient mini-batch based sampling. Our resulting SMCSGHMC algorithm outperforms standard stochastic gradient descent (SGD) and deep ensembles across image classification, out-of-distribution (OOD) detection, and transfer learning tasks. We further show that SMCSGHMC mitigates overfitting and improves calibration, providing a flexible, scalable pathway for converting pretrained neural networks into well-calibrated Bayesian models.
Reviews: Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo
Update after rebuttal: I think the rebuttal is fair. It is very reassuring that pseudocode will be provided to the readers. I therefore keep my decision unchanged. Original review: In the paper "Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo" the author(s) consider the problem of inference for deep gaussian processes (DGPs). Given the large number of layers and width of each layer, direct inference is computaitonal infeasible, which has motivated numerous variational inference methods to approximate the posterior distribution, for example doubly stochastic variational inference (DSVI) of [Salimbeni and Deisenroth, 2017] The authors argue that these unimodal approximations are typically poor given the multimodal and non-Gaussian nature of the posterior.
Bayesian Optimization with Robust Bayesian Neural Networks
Bayesian optimization is a prominent method for optimizing expensive-to-evaluate black-box functions that is widely applied to tuning the hyperparameters of machine learning algorithms. Despite its successes, the prototypical Bayesian optimization approach - using Gaussian process models - does not scale well to either many hyperparameters or many function evaluations. Attacking this lack of scalability and flexibility is thus one of the key challenges of the field. We present a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible. We obtain scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness we improve via a scale adaptation.
Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo
Havasi, Marton, Hernández-Lobato, José Miguel, Murillo-Fuentes, Juan José
Deep Gaussian Processes (DGPs) are hierarchical generalizations of Gaussian Processes that combine well calibrated uncertainty estimates with the high flexibility of multilayer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art inference method, Variational Inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation of the generally multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior and we apply the Stochastic Gradient Hamiltonian Monte Carlo method to generate samples.
Nonasymptotic analysis of Stochastic Gradient Hamiltonian Monte Carlo under local conditions for nonconvex optimization
Akyildiz, Ömer Deniz, Sabanis, Sotirios
This problem arises in many cases in machine learning, most notably in large-scale (mini-batch) Bayesian inference (Welling and Teh, 2011, Ahn et al., 2012) and nonconvex stochastic optimization (Raginsky et al., 2017). For the setting of Bayesian inference, one is interested in sampling from a posterior probability measure where U corresponds to the sum of the log-likelihood and the log-prior. For the nonconvex optimization, U(·) is the nonconvex cost function to be minimized. For large values ofβ, a sample from the target measure (1) is an approximate minimizer of the potential U (Raginsky et al., 2017). Consequently, nonasymptotic error bounds for the schemes, which are designed to sample from (1), can be used to obtain guarantees for Bayesian inference or nonconvex optimization. Sampling from a measure of the form (1) is also central in statistical physics (Binder et al., 1993), most notably in molecular dynamics Haile (1992).
Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Learning in the Big Data Regime
Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) is a momentum version of stochastic gradient descent with properly injected Gaussian noise to find a global minimum. In this paper, non-asymptotic convergence analysis of SGHMC is given in the context of non-convex optimization, where subsampling techniques are used over an i.i.d dataset for gradient updates. Our results complement those of [RRT17] and improve on those of [GGZ18].
Parallel-tempered Stochastic Gradient Hamiltonian Monte Carlo for Approximate Multimodal Posterior Sampling
Luo, Rui, Zhang, Qiang, Liu, Yuanyuan
We propose a new sampler that integrates the protocol of parallel tempering with the Nos\'e-Hoover (NH) dynamics. The proposed method can efficiently draw representative samples from complex posterior distributions with multiple isolated modes in the presence of noise arising from stochastic gradient. It potentially facilitates deep Bayesian learning on large datasets where complex multimodal posteriors and mini-batch gradient are encountered.
Predictive Uncertainty in Large Scale Classification using Dropout - Stochastic Gradient Hamiltonian Monte Carlo
Vergara, Diego, Hernández, Sergio, Valdenegro, Matías, Jorquera, Felipe
Abstract--Predictive uncertainty is crucial for many computer vision tasks, from image classification to autonomous driving systems. Hamiltonian Monte Carlo (HMC) is an inference method for sampling complex posterior distributions. On the other hand, Dropout regularization has been proposed as an approximate model averaging technique that tends to improve generalization in large scale models such as deep neural networks. Although, HMC provides convergence guarantees for most standard Bayesian models, it does not handle discrete parameters arising from Dropout regularization. In this paper, we present a robust methodology for predictive uncertainty in large scale classification problems, based on Dropout and Stochastic Gradient Hamiltonian Monte Carlo. Even though Dropout induces a non-smooth energy function with no such convergence guarantees, the resulting discretization of the Hamiltonian proves empirical success. The proposed method allows to effectively estimate predictive accuracy and to provide better generalization for difficult test examples.